Overview

Dataset statistics

Number of variables: 13
Number of observations: 359392
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 35.6 MiB
Average record size in memory: 104.0 B
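
The two memory figures are consistent with each other: dividing the total size by the number of observations gives roughly the reported record size. A minimal sketch of that arithmetic, assuming 1 MiB = 1,048,576 bytes and (an assumption not stated in the report) 8 bytes of storage per column value:

```python
# Cross-check the reported memory figures (values copied from the statistics above).
MIB = 1024 ** 2                      # bytes per MiB

n_rows = 359392                      # number of observations
n_cols = 13                          # number of variables
total_bytes = 35.6 * MIB             # "Total size in memory", rounded to 0.1 MiB

bytes_per_row = total_bytes / n_rows

# Assumption: every column stores one 8-byte value (or pointer) per row.
assumed_bytes_per_row = n_cols * 8

print(f"from total size : {bytes_per_row:.1f} B per record")
print(f"13 cols x 8 B   : {assumed_bytes_per_row} B per record")
```

The small gap between the two results is expected, because the total size is only reported to one decimal place.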

Variable types

NUM: 8
CAT: 4
DATE: 1

Warnings

Cost of Trip is highly correlated with KM Travelled (High correlation)
KM Travelled is highly correlated with Cost of Trip (High correlation)
df_index has unique values (Unique)
Transaction ID has unique values (Unique)

Reproduction

Analysis started: 2021-10-07 06:35:54.414614
Analysis finished: 2021-10-07 06:37:08.106526
Duration: 1 minute and 13.69 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml
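
The reported duration can be recomputed from the two timestamps. A small sketch using only the standard library, with both timestamps copied from the Reproduction block:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2021-10-07 06:35:54.414614")
finished = datetime.fromisoformat("2021-10-07 06:37:08.106526")

elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```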

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 359392
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 179695.5
Minimum: 0
Maximum: 359391
Zeros: 1
Zeros (%): < 0.1%
Memory size: 2.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 17969.55
Q1: 89847.75
Median: 179695.5
Q3: 269543.25
95-th percentile: 341421.45
Maximum: 359391
Range: 359391
Interquartile range (IQR): 179695.5

Descriptive statistics

Standard deviation: 103747.6783
Coefficient of variation (CV): 0.5773526789
Kurtosis: -1.2
Mean: 179695.5
Median Absolute Deviation (MAD): 89848
Skewness: 4.59274699e-17
Sum: 6.458112514e+10
Variance: 1.076358075e+10
Monotonicity: Not monotonic
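
Because df_index takes every integer from 0 to 359391 exactly once, its statistics follow from closed-form expressions for an evenly spaced sequence. A minimal sketch, assuming the report uses the sample (n - 1) variance, which is consistent with the printed figures:

```python
import math

n = 359392                       # number of observations; df_index = 0 .. n-1

mean = (n - 1) / 2               # average of 0 .. n-1
variance = n * (n + 1) / 12      # sample variance (n - 1 denominator) of 0 .. n-1
std = math.sqrt(variance)
cv = std / mean                  # coefficient of variation
total = n * (n - 1) // 2         # sum of 0 .. n-1

print(mean, variance, std, cv, total)
```

The near-zero skewness and the kurtosis of -1.2 are likewise what one would expect for values spread uniformly over a range.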
Histogram with fixed size bins (bins=50)
Common values

Value                  Count   Frequency (%)
0                      1       < 0.1%
122257                 1       < 0.1%
105881                 1       < 0.1%
103832                 1       < 0.1%
126359                 1       < 0.1%
124310                 1       < 0.1%
130453                 1       < 0.1%
128404                 1       < 0.1%
118163                 1       < 0.1%
116114                 1       < 0.1%
Other values (359382)  359382  > 99.9%

Minimum 5 values

Value  Count  Frequency (%)
0      1      < 0.1%
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%

Maximum 5 values

Value   Count  Frequency (%)
359391  1      < 0.1%
359390  1      < 0.1%
359389  1      < 0.1%
359388  1      < 0.1%
359387  1      < 0.1%
 

Transaction ID
Real number (ℝ≥0)

UNIQUE

Distinct: 359392
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10220761.19
Minimum: 10000011
Maximum: 10440107
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.7 MiB

Quantile statistics

Minimum: 10000011
5-th percentile: 10022854.55
Q1: 10110809.75
Median: 10221035.5
Q3: 10330937.25
95-th percentile: 10418091.45
Maximum: 10440107
Range: 440096
Interquartile range (IQR): 220127.5

Descriptive statistics

Standard deviation: 126805.8037
Coefficient of variation (CV): 0.01240668884
Kurtosis: -1.19892498
Mean: 10220761.19
Median Absolute Deviation (MAD): 110064
Skewness: 7.232656511e-05
Sum: 3.673259804e+12
Variance: 1.607971186e+10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                  Count   Frequency (%)
10000403               1       < 0.1%
10249510               1       < 0.1%
10235183               1       < 0.1%
10233134               1       < 0.1%
10239277               1       < 0.1%
10237228               1       < 0.1%
10226987               1       < 0.1%
10231081               1       < 0.1%
10229032               1       < 0.1%
10251559               1       < 0.1%
Other values (359382)  359382  > 99.9%

Minimum 5 values

Value     Count  Frequency (%)
10000011  1      < 0.1%
10000012  1      < 0.1%
10000013  1      < 0.1%
10000014  1      < 0.1%
10000015  1      < 0.1%

Maximum 5 values

Value     Count  Frequency (%)
10440107  1      < 0.1%
10440106  1      < 0.1%
10440105  1      < 0.1%
10440104  1      < 0.1%
10440101  1      < 0.1%
 
Date of Travel
Date

Distinct: 1095
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 MiB
Minimum: 2016-01-02 00:00:00
Maximum: 2018-12-31 00:00:00
Histogram with fixed size bins (bins=50)

Company
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 MiB

Value       Count   Frequency (%)
Yellow Cab  274681  76.4%
Pink Cab    84711   23.6%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 10
Median length: 10
Mean length: 9.528587169
Min length: 8
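
The mean length and the frequency percentages for Company can be recovered from the two category counts alone. A short sketch of that calculation, using the counts reported for this variable:

```python
# Category counts for the Company variable, as reported above.
counts = {"Yellow Cab": 274681, "Pink Cab": 84711}

n = sum(counts.values())

# Mean label length, weighted by how often each label occurs.
mean_length = sum(len(label) * c for label, c in counts.items()) / n

# Share of each category as a percentage of all observations.
share = {label: 100 * c / n for label, c in counts.items()}

print(n, round(mean_length, 9), {k: round(v, 1) for k, v in share.items()})
```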

City
Categorical

Distinct: 19
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 MiB

Value              Count
NEW YORK NY        99885
CHICAGO IL         56625
LOS ANGELES CA     48033
WASHINGTON DC      43737
BOSTON MA          29692
Other values (14)  81420

Value             Count  Frequency (%)
NEW YORK NY       99885  27.8%
CHICAGO IL        56625  15.8%
LOS ANGELES CA    48033  13.4%
WASHINGTON DC     43737  12.2%
BOSTON MA         29692  8.3%
SAN DIEGO CA      20488  5.7%
SILICON VALLEY    8519   2.4%
SEATTLE WA        7997   2.2%
ATLANTA GA        7557   2.1%
DALLAS TX         7017   2.0%
Other values (9)  29842  8.3%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 14
Median length: 11
Mean length: 11.29946409
Min length: 8

KM Travelled
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 874
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.56725408
Minimum: 1.9
Maximum: 48
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.7 MiB

Quantile statistics

Minimum: 1.9
5-th percentile: 3.57
Q1: 12
Median: 22.44
Q3: 32.96
95-th percentile: 42
Maximum: 48
Range: 46.1
Interquartile range (IQR): 20.96

Descriptive statistics

Standard deviation: 12.23352593
Coefficient of variation (CV): 0.5420919125
Kurtosis: -1.126875356
Mean: 22.56725408
Median Absolute Deviation (MAD): 10.45
Skewness: 0.05577890774
Sum: 8110490.58
Variance: 149.6591566
Monotonicity: Not monotonic
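
Several of the KM Travelled figures are derived from others in the same listing. A minimal sketch of those relationships, using the rounded values printed in the report (so the results agree only up to rounding):

```python
# Figures copied from the KM Travelled statistics above.
n = 359392
mean, std = 22.56725408, 12.23352593
q1, q3 = 12.0, 32.96
minimum, maximum = 1.9, 48.0

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance
total = mean * n                  # Sum

print(value_range, iqr, cv, variance, total)
```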
Histogram with fixed size bins (bins=50)
Common values

Value               Count   Frequency (%)
33.6                1536    0.4%
24                  1080    0.3%
22.8                1075    0.3%
35.7                1069    0.3%
16.8                1065    0.3%
37.44               1062    0.3%
39.6                1056    0.3%
28.08               972     0.3%
21.85               769     0.2%
18                  754     0.2%
Other values (864)  348954  97.1%

Minimum 5 values

Value  Count  Frequency (%)
1.9    339    0.1%
1.92   375    0.1%
1.94   329    0.1%
1.96   383    0.1%
1.98   374    0.1%

Maximum 5 values

Value  Count  Frequency (%)
48     366    0.1%
47.6   381    0.1%
47.2   378    0.1%
46.8   737    0.2%
46.41  380    0.1%
 

Price Charged
Real number (ℝ≥0)

Distinct: 99176
Distinct (%): 27.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 423.4433113
Minimum: 15.6
Maximum: 2048.03
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.7 MiB

Quantile statistics

Minimum: 15.6
5-th percentile: 63.42
Q1: 206.4375
Median: 386.36
Q3: 583.66
95-th percentile: 944.89
Maximum: 2048.03
Range: 2032.43
Interquartile range (IQR): 377.2225

Descriptive statistics

Standard deviation: 274.3789114
Coefficient of variation (CV): 0.6479708243
Kurtosis: 0.7476354732
Mean: 423.4433113
Median Absolute Deviation (MAD): 187.22
Skewness: 0.8737614916
Sum: 152182138.5
Variance: 75283.78705
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
298.32                18      < 0.1%
191.27                18      < 0.1%
198.8                 17      < 0.1%
181.59                17      < 0.1%
216.37                17      < 0.1%
115.53                17      < 0.1%
79.38                 16      < 0.1%
264.83                15      < 0.1%
399.41                15      < 0.1%
248.41                15      < 0.1%
Other values (99166)  359227  > 99.9%

Minimum 5 values

Value  Count  Frequency (%)
15.6   1      < 0.1%
15.75  1      < 0.1%
16.38  1      < 0.1%
16.53  1      < 0.1%
16.76  1      < 0.1%

Maximum 5 values

Value    Count  Frequency (%)
2048.03  1      < 0.1%
2016.7   1      < 0.1%
2013.95  1      < 0.1%
1993.83  1      < 0.1%
1981.05  1      < 0.1%
 

Cost of Trip
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 16291
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 286.1901128
Minimum: 19
Maximum: 691.2
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.7 MiB

Quantile statistics

Minimum: 19
5-th percentile: 46.224
Q1: 151.2
Median: 282.48
Q3: 413.6832
95-th percentile: 544.3632
Maximum: 691.2
Range: 672.2
Interquartile range (IQR): 262.4832

Descriptive statistics

Standard deviation: 157.9936612
Coefficient of variation (CV): 0.5520584188
Kurtosis: -1.012232752
Mean: 286.1901128
Median Absolute Deviation (MAD): 131.232
Skewness: 0.1379580609
Sum: 102854437
Variance: 24961.99696
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
362.88                186     0.1%
479.808               184     0.1%
471.744               180     0.1%
205.632               178     < 0.1%
411.264               166     < 0.1%
336.96                166     < 0.1%
428.4                 164     < 0.1%
241.92                161     < 0.1%
423.36                161     < 0.1%
443.52                160     < 0.1%
Other values (16281)  357686  99.5%

Minimum 5 values

Value   Count  Frequency (%)
19      2      < 0.1%
19.19   4      < 0.1%
19.2    4      < 0.1%
19.38   2      < 0.1%
19.392  1      < 0.1%

Maximum 5 values

Value    Count  Frequency (%)
691.2    9      < 0.1%
685.44   29     < 0.1%
679.728  14     < 0.1%
679.68   33     < 0.1%
674.016  34     < 0.1%
 

Customer ID
Real number (ℝ≥0)

Distinct: 46148
Distinct (%): 12.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19191.65212
Minimum: 1
Maximum: 60000
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.7 MiB

Quantile statistics

Minimum: 1
5-th percentile: 544
Q1: 2705
Median: 7459
Q3: 36078
95-th percentile: 58189
Maximum: 60000
Range: 59999
Interquartile range (IQR): 33373

Descriptive statistics

Standard deviation: 21012.41246
Coefficient of variation (CV): 1.094872517
Kurtosis: -0.885062488
Mean: 19191.65212
Median Absolute Deviation (MAD): 6362
Skewness: 0.880030242
Sum: 6897326237
Variance: 441521477.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
494                   54      < 0.1%
2939                  53      < 0.1%
2766                  51      < 0.1%
1070                  51      < 0.1%
126                   50      < 0.1%
944                   50      < 0.1%
858                   50      < 0.1%
1803                  50      < 0.1%
1067                  50      < 0.1%
1628                  50      < 0.1%
Other values (46138)  358883  99.9%

Minimum 5 values

Value  Count  Frequency (%)
1      29     < 0.1%
2      40     < 0.1%
3      46     < 0.1%
4      26     < 0.1%
5      31     < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
60000  18     < 0.1%
59999  8      < 0.1%
59998  9      < 0.1%
59997  10     < 0.1%
59996  4      < 0.1%
 

Payment_Mode
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 MiB

Value  Count   Frequency (%)
Card   215504  60.0%
Cash   143888  40.0%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 MiB

Value   Count   Frequency (%)
Male    205912  57.3%
Female  153480  42.7%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 6
Median length: 4
Mean length: 4.854109162
Min length: 4

Age
Real number (ℝ≥0)

Distinct: 48
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.33670477
Minimum: 18
Maximum: 65
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.7 MiB

Quantile statistics

Minimum: 18
5-th percentile: 19
Q1: 25
Median: 33
Q3: 42
95-th percentile: 61
Maximum: 65
Range: 47
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 12.59423447
Coefficient of variation (CV): 0.3564065906
Kurtosis: -0.4583967778
Mean: 35.33670477
Median Absolute Deviation (MAD): 8
Skewness: 0.6853387826
Sum: 12699729
Variance: 158.6147419
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=48)
Common values

Value              Count   Frequency (%)
23                 12327   3.4%
20                 12229   3.4%
27                 12030   3.3%
25                 11973   3.3%
32                 11959   3.3%
34                 11825   3.3%
39                 11798   3.3%
22                 11796   3.3%
26                 11655   3.2%
19                 11591   3.2%
Other values (38)  240209  66.8%

Minimum 5 values

Value  Count  Frequency (%)
18     10846  3.0%
19     11591  3.2%
20     12229  3.4%
21     11431  3.2%
22     11796  3.3%

Maximum 5 values

Value  Count  Frequency (%)
65     3379   0.9%
64     3908   1.1%
63     3733   1.0%
62     3530   1.0%
61     4361   1.2%
 

Income (USD/Month)
Real number (ℝ≥0)

Distinct: 22725
Distinct (%): 6.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15048.82294
Minimum: 2000
Maximum: 35000
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.7 MiB

Quantile statistics

Minimum: 2000
5-th percentile: 3245
Q1: 8424
Median: 14685
Q3: 21035
95-th percentile: 29659
Maximum: 35000
Range: 33000
Interquartile range (IQR): 12611

Descriptive statistics

Standard deviation: 7969.409482
Coefficient of variation (CV): 0.529570287
Kurtosis: -0.6604857162
Mean: 15048.82294
Median Absolute Deviation (MAD): 6304
Skewness: 0.3095622398
Sum: 5408426573
Variance: 63511487.49
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
20884                 134     < 0.1%
8899                  133     < 0.1%
22525                 129     < 0.1%
16512                 121     < 0.1%
16137                 118     < 0.1%
9797                  116     < 0.1%
16289                 116     < 0.1%
21045                 114     < 0.1%
8672                  112     < 0.1%
13413                 111     < 0.1%
Other values (22715)  358188  99.7%

Minimum 5 values

Value  Count  Frequency (%)
2000   9      < 0.1%
2001   1      < 0.1%
2002   2      < 0.1%
2003   8      < 0.1%
2004   6      < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
35000  1      < 0.1%
34996  15     < 0.1%
34995  4      < 0.1%
34989  30     < 0.1%
34985  16     < 0.1%
 

Interactions

[64 interaction plots; images not preserved in this text version]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the slope of the line (its angle to the x-axis) does not change the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
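
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code the report itself runs; the helper name `pearson_r` is hypothetical, and the ten value pairs are the KM Travelled and Cost of Trip figures read from the "First rows" listing in the Sample section.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/(n-1) factors cancel, so raw sums of squares are sufficient.
    return cov / math.sqrt(var_x * var_y)

# KM Travelled and Cost of Trip from the ten first sample rows.
km = [17.92, 19.04, 37.24, 3.06, 2.10, 32.64, 37.12, 27.30, 5.82, 5.45]
cost = [253.7472, 253.6128, 536.2560, 36.7200, 21.4200,
        349.2480, 507.8016, 343.9800, 76.8240, 69.3240]

r = pearson_r(km, cost)
# Shifting and rescaling one variable (positive scale) leaves r unchanged.
r_scaled = pearson_r([1000 * v + 5 for v in km], cost)
print(r, r_scaled)
```

Even on these ten rows the two variables are strongly linearly related, which is in line with the high-correlation warning raised for the full dataset.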

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
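
As a sketch of that recipe, ρ can be obtained by ranking both variables and applying the Pearson formula to the ranks. The helper names below are hypothetical, and giving tied values the average of their positions is one common convention, assumed here rather than taken from the report.

```python
import math

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship: rho is 1 while r falls short of 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(pearson_r(x, y), spearman_rho(x, y))
```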

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
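
A direct, quadratic-time sketch of this pair-counting definition follows. The function name is hypothetical. It implements the plain formula from the text (often called tau-a); tie-adjusted variants such as tau-b can give different values when the data contain ties.

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
        # direction == 0 is a tie and counts as neither
    return (concordant - discordant) / len(pairs)

print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))   # fully concordant
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))   # fully discordant
print(kendall_tau([1, 2, 3], [1, 3, 2]))         # two concordant, one discordant
```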

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
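
A minimal sketch of a bias-corrected Cramér's V in the form usually attributed to Bergsma (2013) is given below. It is not taken from the report's source code; the function name and the example table are hypothetical, and the sketch assumes a contingency table with no empty rows or columns.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(r_corr - 1, k_corr - 1))

# Hypothetical 2x2 table of counts (not taken from the report).
table = [[30, 10], [10, 30]]
print(cramers_v_corrected(table))
```

Clamping the corrected φ² at zero means that tables close to independence return exactly 0 instead of a small positive value.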

Missing values

[2 missing-value plots; images not preserved in this text version]

Sample

First rows

Row     df_index  Transaction ID  Date of Travel  Company     City         KM Travelled  Price Charged  Cost of Trip  Customer ID  Payment_Mode  Gender  Age  Income (USD/Month)
0       10422     10000845        2016-01-02      Yellow Cab  NEW YORK NY  17.92         561.71         253.7472      9            Card          Male    32   21212
1       14242     10000961        2016-01-02      Yellow Cab  NEW YORK NY  19.04         634.46         253.6128      85           Card          Male    19   19765
2       13252     10000929        2016-01-02      Yellow Cab  NEW YORK NY  37.24         1065.31        536.2560      439          Cash          Male    22   5494
3       11247     10000869        2016-01-02      Yellow Cab  NEW YORK NY  3.06          104.70         36.7200       475          Cash          Male    36   9959
4       2025      10000145        2016-01-02      Pink Cab    NEW YORK NY  2.10          37.18          21.4200       502          Cash          Male    28   15285
5       2162      10000149        2016-01-02      Pink Cab    NEW YORK NY  32.64         498.60         349.2480      533          Card          Male    52   15974
6       14700     10000975        2016-01-02      Yellow Cab  NEW YORK NY  37.12         1238.35        507.8016      573          Card          Male    34   2589
7       9564      10000818        2016-01-02      Yellow Cab  NEW YORK NY  27.30         810.52         343.9800      818          Card          Male    18   8653
8       9348      10000812        2016-01-02      Yellow Cab  NEW YORK NY  5.82          171.76         76.8240       901          Cash          Male    36   20574
9       10359     10000843        2016-01-02      Yellow Cab  NEW YORK NY  5.45          175.87         69.3240       957          Card          Male    61   4347

Last rows

Row     df_index  Transaction ID  Date of Travel  Company     City       KM Travelled  Price Charged  Cost of Trip  Customer ID  Payment_Mode  Gender  Age  Income (USD/Month)
359382  177132    10434172        2018-12-31      Yellow Cab  BOSTON MA  4.52          69.01          62.9184       58205        Card          Male    41   21247
359383  104755    10437748        2018-12-31      Yellow Cab  BOSTON MA  19.80         286.36         237.6000      58809        Cash          Female  42   19371
359384  307802    10434224        2018-12-31      Yellow Cab  BOSTON MA  30.07         446.61         433.0080      58956        Card          Male    39   24646
359385  248975    10437814        2018-12-31      Yellow Cab  BOSTON MA  17.10         238.07         240.0840      59185        Card          Female  42   11396
359386  152491    10434288        2018-12-31      Yellow Cab  BOSTON MA  2.26          31.37          28.2048       59187        Cash          Female  52   13751
359387  228341    10433128        2018-12-31      Pink Cab    BOSTON MA  29.97         390.42         317.6820      59274        Card          Female  25   22928
359388  321484    10437817        2018-12-31      Yellow Cab  BOSTON MA  38.85         504.11         540.7920      59494        Cash          Female  35   17699
359389  306342    10433131        2018-12-31      Pink Cab    BOSTON MA  27.27         370.20         324.5130      59768        Cash          Female  25   24526
359390  21766     10437732        2018-12-31      Yellow Cab  BOSTON MA  25.30         362.48         352.1760      59925        Cash          Male    36   24313
359391  164871    10436696        2018-12-31      Pink Cab    BOSTON MA  27.55         377.85         330.6000      60000        Cash          Female  27   20303